In-place Update of Suffix Array while Recoding Words

Authors

  • Matthias Gallé
  • Pierre Peterlongo
  • François Coste
Abstract

Motivated by applications in grammatical inference and data compression, we propose an algorithm that updates a suffix array after some occurrences of a given word in the indexed text are replaced by a new character. Unlike other published index update methods, the problem addressed here may require modifying a large number of distinct positions in the original text. The proposed algorithm exploits the internal order of suffix arrays to update groups of entries simultaneously, and ensures that only the entries that must be modified are visited. Experiments confirm a significant speed-up in execution time compared with constructing the suffix array from scratch at each step of the application.
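To make the problem statement concrete, here is a minimal Python sketch of the baseline the abstract compares against: recode the chosen occurrences, then rebuild the suffix array from scratch. It is not the authors' in-place update algorithm. The function names `suffix_array` and `recode`, the quadratic sort-based construction, and the assumption that the new symbol is a single character absent from the text are all illustrative choices, not details taken from the paper.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions sorted lexicographically.

    Illustrative only; sorting whole suffixes costs O(n^2 log n) in the worst case.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def recode(text, word, positions, new_symbol):
    """Replace the occurrences of `word` that start at `positions` by `new_symbol`.

    Assumes the selected occurrences do not overlap and that `new_symbol`
    is a single character not already present in `text`.
    """
    pieces = []
    cursor = 0  # first position of the text not yet copied
    for pos in sorted(positions):
        if pos < cursor:
            raise ValueError("selected occurrences overlap")
        if not text.startswith(word, pos):
            raise ValueError(f"{word!r} does not occur at position {pos}")
        pieces.append(text[cursor:pos])
        pieces.append(new_symbol)
        cursor = pos + len(word)
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "abracadabra"
# Recode only the second occurrence of "abra" (position 7), not the first.
recoded = recode(text, "abra", [7], "X")
# Baseline: rebuild the whole index after the substitution.
rebuilt = suffix_array(recoded)
```

Even in this small case, most entries of the rebuilt array differ from the original one, which is the situation the abstract describes: a single recoding step can touch many positions of the index.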


Similar resources

In-Place Longest Common Extensions

Longest Common Extension (LCE) queries are a fundamental sub-routine in many string-processing algorithms, including (but not limited to) suffix-sorting, string matching, and identification of palindrome factors and repeats. An LCE query takes as input two positions i, j in a text T over an alphabet Σ and returns the length l of the longest common prefix between T's i-th and j-th suffixes. It is clear that we ...


Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets

Suffix arrays and LCP arrays are among the most fundamental data structures, widely used for various kinds of string processing. Many problems can be solved efficiently by using suffix arrays, or a pair of suffix arrays and LCP arrays. In this paper, we consider two problems for a string of length N, the characters of which are represented as integers in [1, ..., σ] for 1 ≤ σ ≤ N; the stri...


Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

We consider the problem of encoding a string of length n from an alphabet [0, σ − 1] so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. A clear lower bound on the size of any prefix-free encoding of this kind is n log σ + Θ(log(nσ)) bits. We describe a new encoding matching this lower bound when σ ≤ n^{O(1)} while su...


Optimal In-Place Suffix Sorting

The suffix array is a fundamental data structure for many applications that involve string searching and data compression. Designing time- and space-efficient suffix array construction algorithms has attracted significant attention, and considerable advances have been made in the last 20 years. We obtain suffix array construction algorithms that are optimal both in time and space for both intege...


Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n...



Publication date: 2008